-
Notifications
You must be signed in to change notification settings - Fork 253
Replace otel.sdk.span.ended metric with improved span.started metric #2431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace otel.sdk.span.ended metric with improved span.started metric #2431
Conversation
Do you have a java example/prototype I can see? |
@dashpole here is the PR for the java implementation. However, there we in fact do record I also just noticed that it in fact isn't really possible to return a noop-singleton span implementation even for non-recording spans: Even those spans need to correctly return their This and your comment got me thinking: Why not just ditch all the WDYT? |
I believe that is explicitly prohibited by the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/445a6835d755c1913a9acaa227997952ac9b4ea2/specification/trace/api.md?plain=1#L734. I don't think your implementation is permitted by the spec.
|
Oh wow, great find! We totally overlooked this when initially defining the metrics. I'd then suggest to do the following:
This way the metrics are simple to define and don't have that wierdness that The number of ended spans can still be derived via EDIT: After reading the spec again, I don't think my implementation violates the spec. The spec you linked is about So I think there are two paths forward:
I would prefer (a), but I don't have a strong opinion. At least in Java it's easy to implement as shown by my PR and gives the data one less edge case. Also I think the performance overhead of the |
I think that might work. The one thing i'm not sure of is whether we needed spans.ended to help make sense of the processor metrics. It came from #1631 (comment), which was a suggestion by @jmacd. @jmacd do you have any concerns with measuring spans.started instead of spans.ended? |
I agree there is a use-case to track live spans. The question is whether SDKs can/should implement it. It looks like SDKs MAY use a NonRecordingSpan described in the API when a decision not to record is made: https://github.com/open-telemetry/opentelemetry-specification/blob/445a6835d755c1913a9acaa227997952ac9b4ea2/specification/trace/sdk.md#sdk-span-creation
If they do that, they won't be able to implement the live metric. There is a bit of gray area as well. Go doesn't explicitly use the NonRecordingSpan from the API, since the type is kept private, but instead defines a different private nonRecordingSpan in the SDK. I would assume the non-recording spec applies to the copy we have in our SDK, but you could make the case it shouldn't apply. In general, i'm still concerned with changing a call that is expected to be on the hot path, and is currently a no-op (ending a non-recording span), and changing it to a call to the metrics API (including constructing attributes, etc). I prefer (b) |
We wanted the ability to have both the number of started and ended spans, but decided to collect just one of them because they can be derived from each other with the help of E.g.
Then let's just go with (b). It's easier to add spans.live for non-recording spans later than removing it from the spec if we ever feel the need for it. |
Resolve open-telemetry#7030 This generation fixes the following: - The `semconv/weaver.yaml` file is moved to the `semconv/templates/registry/go` to [fix the generation](open-telemetry#7030 (comment)). - Deprecated enum `var`s are no longer generate. - The deprecation of the previous `OSTypeZOS` (`z_os`) conflicts with the replacement (`zos`). - This matches all other generations where deprecated types are not generated and the migration docs identify users need to upgrade when upgrading packages. - Fix metric description rendering to handle multi-line metric descriptions. - Fix filter of deprecated metrics in weaver configuration (see open-telemetry/weaver#847). The release notes for v1.35.0 are included as we do not plan to add a package for that release and it includes breaking changes being released here (i.e. `az.namespace` and `az.service_request_id`) ## [`v1.36.0` semantic convention release notes](https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.36.0) <div data-pjax="true" data-test-selector="body-content" data-view-component="true" class="markdown-body my-3"><h3>🚩 Deprecations 🚩</h3> <ul> <li><code>os</code>: Adds the 'deprecated:' tag/attribute to the <code>z_os</code> enum value of <code>os.type</code>. This value was recently deprecated in v1.35.0. (<a href="open-telemetry/semantic-conventions#2479" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2479/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2479</a>)</li> </ul> <h3>💡 Enhancements 💡</h3> <ul> <li><code>otel</code>: Replaces <code>otel.sdk.span.ended</code> with <code>otel.sdk.span.started</code> and allow differentiation based on the parent span origin (<a href="open-telemetry/semantic-conventions#2431" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2431/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2431</a>)</li> <li><code>db</code>: Add database context propagation via <code>SET CONTEXT_INFO</code> for SQL Server (<a href="open-telemetry/semantic-conventions#2162" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2162/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2162</a>)</li> <li><code>entities</code>: Adds support for Entity registry and Entity stabilization policies. (<a href="open-telemetry/semantic-conventions#2246" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2246/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2246</a>)</li> </ul> <h3>🧰 Bug fixes 🧰</h3> <ul> <li><code>cloud</code>: Exclude deprecated Azure members from code generation on the <code>cloud.platform</code> attribute (<a href="open-telemetry/semantic-conventions#2477" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2477/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2477</a>, <a href="open-telemetry/semantic-conventions#2455" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2455/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2455</a>)</li> </ul> ## [`v1.35.0` semantic convention release notes](https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.35.0) <div data-pjax="true" data-test-selector="body-content" data-view-component="true" class="markdown-body my-3"><h3>🛑 Breaking changes 🛑</h3> <ul> <li> <p><code>azure</code>: Align azure events, attributes, and enum members with general naming guidelines. (<a href="open-telemetry/semantic-conventions#608" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/608/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#608</a>, <a href="open-telemetry/semantic-conventions#1708" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/1708/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1708</a>, <a href="open-telemetry/semantic-conventions#1698" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/1698/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1698</a>)</p> <ul> <li>Renamed attributes: <ul> <li><code>az.service_request_id</code> to <code>azure.service.request.id</code></li> <li><code>az.namespace</code> to <code>azure.resource_provider.namespace</code></li> </ul> </li> <li>Renamed events: <ul> <li><code>az.resource.log</code> to <code>azure.resource.log</code></li> </ul> </li> <li>Renamed enum members: <ul> <li><code>az.ai.inference</code> to <code>azure.ai.inference</code> (on <code>gen_ai.system</code>)</li> <li><code>az.ai.openai</code> to <code>azure.ai.openai</code> (on <code>gen_ai.system</code>)</li> <li><code>azure_aks</code> to <code>azure.aks</code> (on <code>cloud.platform</code>)</li> <li><code>azure_app_service</code> to <code>azure.app_service</code> (on <code>cloud.platform</code>)</li> <li><code>azure_container_apps</code> to <code>azure.container_apps</code> (on <code>cloud.platform</code>)</li> <li><code>azure_container_instances</code> to <code>azure.container_instances</code> (on <code>cloud.platform</code>)</li> <li><code>azure_functions</code> to <code>azure.functions</code> (on <code>cloud.platform</code>)</li> <li><code>azure_openshift</code> to <code>azure.open_shift</code> (on <code>cloud.platform</code>)</li> <li><code>azure_vm</code> to <code>azure.vm</code> (on <code>cloud.platform</code>)</li> </ul> </li> </ul> </li> <li> <p><code>system</code>: Revert the change that moved <code>system.cpu.*</code> to <code>cpu.*</code>. The 3 affected metrics are back in <code>system.cpu.*</code>. (<a href="open-telemetry/semantic-conventions#1873" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1873/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1873</a>)</p> </li> <li> <p><code>system</code>: Changes system.network.connections to system.network.connection.count (<a href="open-telemetry/semantic-conventions#1800" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1800/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1800</a>)</p> </li> <li> <p><code>k8s</code>: Change instrument type for .limit/.request container metrics from gauge to updowncounter (<a href="open-telemetry/semantic-conventions#2354" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2354/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2354</a>)</p> </li> </ul> <h3>🚩 Deprecations 🚩</h3> <ul> <li><code>os</code>: Deprecate os.type='z_os' and replace with os.type='zos' (<a href="open-telemetry/semantic-conventions#1687" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1687/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1687</a>)</li> </ul> <h3>🚀 New components 🚀</h3> <ul> <li><code>mainframe, zos</code>: Add initial semantic conventions for mainframes (<a href="open-telemetry/semantic-conventions#1687" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1687/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1687</a>)</li> </ul> <h3>💡 Enhancements 💡</h3> <ul> <li> <p><code>dotnet</code>: Define .NET-specific network spans for DNS resolution, TLS handshake, and socket connections, along with HTTP-level spans to (optionally) record relationships between HTTP requests and connections.<br> (<a href="open-telemetry/semantic-conventions#1192" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/1192/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1192</a>)</p> </li> <li> <p><code>k8s</code>: Add <code>k8s.node.allocatable.cpu</code>, <code>k8s.node.allocatable.ephemeral_storage</code>, <code>k8s.node.allocatable.memory</code>, <code>k8s.node.allocatable.pods</code> metrics (<a href="open-telemetry/semantic-conventions#2243" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2243/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2243</a>)</p> </li> <li> <p><code>k8s</code>: Add k8s.container.restart.count metric (<a href="open-telemetry/semantic-conventions#2191" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2191/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2191</a>)</p> </li> <li> <p><code>k8s</code>: Add K8s container resource related metrics (<a href="open-telemetry/semantic-conventions#2074" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2074/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2074</a>)</p> </li> <li> <p><code>k8s</code>: Add k8s.container.ready metric (<a href="open-telemetry/semantic-conventions#2074" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2074/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2074</a>)</p> </li> <li> <p><code>k8s</code>: Add k8s.node.condition metric (<a href="open-telemetry/semantic-conventions#2077" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2077/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2077</a>)</p> </li> <li> <p><code>k8s</code>: Add k8s resource quota metrics (<a href="open-telemetry/semantic-conventions#2076" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2076/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2076</a>)</p> </li> <li> <p><code>events</code>: Update general event guidance to allow complex attributes on events and use them instead of the body fields.<br> (<a href="open-telemetry/semantic-conventions#1651" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1651/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1651</a>, <a href="open-telemetry/semantic-conventions#1669" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1669/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1669</a>)</p> </li> <li> <p><code>k8s</code>: Add k8s.container.status.state and k8s.container.status.reason metrics (<a href="open-telemetry/semantic-conventions#1672" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/1672/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#1672</a>)</p> </li> <li> <p><code>feature_flags</code>: Mark feature flag semantic convention as release candidate. (<a href="open-telemetry/semantic-conventions#2277" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2277/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2277</a>)</p> </li> <li> <p><code>k8s</code>: Add new resource attributes for <code>k8s.hpa</code> to capture the <code>scaleTargetRef</code> fields (<a href="open-telemetry/semantic-conventions#2008" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2008/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2008</a>)<br> Adds below attributes to the <code>k8s.hpa</code> resource:</p> <ul> <li><code>k8s.hpa.scaletargetref.kind</code></li> <li><code>k8s.hpa.scaletargetref.name</code></li> <li><code>k8s.hpa.scaletargetref.api_version</code></li> </ul> </li> <li> <p><code>k8s</code>: Adds metrics and attributes to track k8s HPA's metric target values for CPU resources. (<a href="open-telemetry/semantic-conventions#2182" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2182/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2182</a>)<br> Below metrics are introduced to provide insight into HPA scaling configuration for CPU.</p> <ul> <li><code>k8s.hpa.metric.target.cpu.value</code></li> <li><code>k8s.hpa.metric.target.cpu.average_value</code></li> <li><code>k8s.hpa.metric.target.cpu.average_utilization</code></li> </ul> </li> <li> <p><code>k8s</code>: Explains the rationale behind the Kubernetes resource naming convention. (<a href="open-telemetry/semantic-conventions#2245" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2245/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2245</a>)</p> </li> <li> <p><code>all</code>: Adds modelling guide for resource and entity. (<a href="open-telemetry/semantic-conventions#2246" data-hovercard-type="issue" data-hovercard-url="/open-telemetry/semantic-conventions/issues/2246/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2246</a>)</p> </li> <li> <p><code>service</code>: Adds stability policies for Entity groups. (<a href="open-telemetry/semantic-conventions#2378" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2378/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2378</a>)<br> Entity groups now require <code>role</code> to be filled for attributes.</p> </li> <li> <p><code>otel</code>: Specifies component.type values for Zipkin and Prometheus exporters (<a href="open-telemetry/semantic-conventions#2229" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2229/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2229</a>)</p> </li> </ul> <h3>🧰 Bug fixes 🧰</h3> <ul> <li><code>otel</code>: Removes <code>otel.scope</code> entity. (<a href="open-telemetry/semantic-conventions#2310" data-hovercard-type="pull_request" data-hovercard-url="/open-telemetry/semantic-conventions/pull/2310/hovercard" aria-keyshortcuts="Alt+ArrowUp">open-telemetry#2310</a>)</li> </ul>
Changes
Enforce recording of
otel.sdk.span.ended
otel.sdk.span.started
With the previous wording in the spec it was permissible for SDKs to not collect
otel.sdk.span.ended
for spans with recordingfalse
. This was initially added due to performance and implementation complexity concerns IIRC, because e.g. SDKs tend to return no-op spans for non-recording spans.This has the down side, that
otel.sdk.span.ended
can't really be trusted to get the total number of spans and subsequently deriving the effective sampling rate, which is a very valuable insight.Therefore this PR changes the wording so thatotel.sdk.span.ended
MUST always be recorded, while still allowing implementations to make use of no-op spans: For non-recording spans, implementations may now recordotel.sdk.span.ended
already when the span is started and omit recordingotel.sdk.span.live
. This means thatotel.sdk.span.ended
can simply be recorded from the tracer starting the span and the span implementation can remain stateless because it doesn't have to track theend()
call.I think the initial request for this more lenient behavior for non-recording spans came from @dashpole while implementing it for go, but in the end wasn't even necessary. @dashpole could you maybe have a look and confirm that this new wording would be fine for you?Initially, this PR proposed to enforce the recording of
otel.sdk.span.ended
, even for non-recording spans.Because of implementation and overhead concerns (see #2431 (comment)) we don't want to force implementations to track
end()
for non-recording calls. In order to track the sampling rate correctly, this would mean non-recording spans would need to increaseotel.sdk.span.ended
when they are started, which would be very counterintuitive and complicates the spec.This PR now resolves this by replacing
otel.sdk.span.ended
withotel.sdk.span.started
. While this is technically a breaking change, it doesn't change the insights gained from the data:spans.ended
can be derived fromspans.started
.This makes the spec clearer and simpler and the metrics do what their name suggests.
In addition we decided for consistency reasons across SDKs to not allow
spans.live
to be collected for non-recording spans for now. This restriction can be lifted in the future if deemed necessary.Add
otel.span.parent.origin
attribute tootel.sdk.span.started
The
otel.span.parent.origin
was introduced to allow to differentiate the metric based on whether a span has a parent and if that parent is remote. This can be trivially implemented via [SpanContext.IsRemote]8https://opentelemetry.io/docs/specs/otel/trace/api/#isremote).At very little cost, this makes the insights that can be derived from
otel.sdk.span.started
much richer:I've opted to go for a single
otel.span.parent.origin
attribute (naming up for discussion as usual) to represent whether a span is a root and span and to distinguish its parent being remote or not. I did consider using two boolean flags instead, but this would allow representation of invalid states (e.g. a span is a root but has been marked as it's parent being remote) and also would be inconsistent withotel.span.sampling_result
.Merge requirement checklist
[chore]
schema-next.yaml updated with changes to existing conventions.